# Licensing and Attribution for This Repository

This repository includes both code and datasets, distributed under various licenses. Please review the information below to ensure compliance when reusing or redistributing this work.

---

## Code

All original code in this repository is licensed under the **Apache License 2.0**. See the `LICENSE` file in the root of the repository for full text.

### Included Third-party Code
This repository may include third-party code, for example as subrepositories or vendor directories. Each such component may come with its own license, which governs its use.

- **MASK repository**  
  Some code is reused from the [MASK repository](XXXX), which is licensed under the **MIT License**.  
  This code is included as a subrepository, along with its own `LICENSE` file.  
  Copyright (c) 2025 centerforaisafety

- **Deception-Detection / Insider-Trading**  
  Some code is adapted from the [Deception-Detection repository](XXXX) by Apollo Research.  
  The licensing status of this code is unclear.  
  We include it under the assumption of academic fair use and provide full attribution to Apollo Research.

---

## Datasets

Unless otherwise specified, all datasets created by the authors of this repository are licensed under the **Creative Commons Attribution 4.0 International License (CC BY 4.0)**.

The following datasets include third-party components and are subject to additional licensing terms:

### 1. Harm-Pressure Datasets
- Includes data from the **WMDP** dataset, licensed under the **MIT License**  
  Copyright (c) 2024 centerforaisafety  
  XXXX

### 2. MASK Dataset
- Includes prompts from the **MASK** dataset, licensed under the **Creative Commons Attribution 4.0 International License (CC BY 4.0)**  
  Copyright (c) 2025 centerforaisafety  
  XXXX

### 3. Soft-Trigger Dataset
- Includes questions from the **BoolQ** dataset, licensed under the **Creative Commons Attribution 3.0 Unported License (CC BY 3.0)**  
  Copyright (c) 2019 Google  
  XXXX

### 4. Alpaca Dataset
- Includes prompts from the **Alpaca** dataset, licensed under the **MIT License**  
  Copyright (c) 2024 Stanford Center for Research on Foundation Models  
  XXXX

### 5. Instructed-Deception Dataset
- Includes data derived from the dataset released by **Azaria & Mitchell (2023)**, from the paper  
  *"The Internal State of an LLM Knows When It's Lying"*. The licensing status of this dataset is unclear.  
  We include this data under the assumption of academic fair use and provide full attribution:
  > Azaria, A. & Mitchell, M. (2023). The Internal State of an LLM Knows When It's Lying.  
  > *Findings of the Association for Computational Linguistics: EMNLP 2023*.  
  > Dataset available at: XXXX

### 6. Insider-Trading Dataset
- Includes prompts from the paper *"Large Language Models can Strategically Deceive their Users when Put Under Pressure"* by Apollo Research, licensed under the **Creative Commons Attribution 4.0 International License (CC BY 4.0)**  
  Copyright (c) 2024 Apollo Research  
  XXXX

---

## License for Our Original Datasets

All datasets not listed above were created by the authors of this repository and are licensed under the **Creative Commons Attribution 4.0 International License (CC BY 4.0)**.

License text: XXXX

---

## Attributions Summary

We provide attribution and preserve the license terms for all reused datasets and code in accordance with their respective licenses. If you redistribute or adapt this repository, please do the same.
